An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
Clustering is a common technique to overcome the wire delay problem incurred by the evolution of technolog
distributed architectures, where the register file, the functional units and the data cache are partitioned, are p
effective to deal with these constraints and besides they are very scalable. In this paper effective instruction s
techniques for a clustered VLIW processor with a word-interleaved cache are proposed. Such scheduling techn 
(i) loop unro ... 
This paper describes and evaluates three architectural methods for accomplishing data parallel computation in
programmable embedded system. Comparisons are made between the well-studied Very Long Instruction Wo
Single Instruction/Multiple Packed Data (SIMpD) paradigms; the less-common Single Instruction Multiple Disj
(SIMdD) architecture is described and evaluated. A taxonomy is defined for data-level parallel arch ...
PICO is a fully automated system for designing the architecture and the microarchitecture of VLIW and EPIC p
serious concern with this class of processors, due to their very long instructions, is their code size. One focus 
to describe a series of code size minimization techniques used within PICO, some of which are applied during 
design of the instruction format, while others are applied during program assembly. The design of a retargeta 

Keywords: EPIC, VLIW, code size minimization, custom templates, design automation, instruction format de 

A virtual instruction set architecture (V-ISA) implemented via a processor-specific software translation layer ca
flexibility to processor designers. Recent examples such as Crusoe and DAISY, however, have used existing ha
instruction sets as virtual ISAs, which complicates translation and optimization. In fact, there has been little res
specific designs for a virtual ISA for processors. This paper proposes a novel virtual ISA (LLVA) and a translatio
implementi ... 

Very Long Instruction Word (VLIW) architectures were promised to deliver far more than the factor of two or 
current architectures achieve from overlapped execution. Using a new type of compiler which compacts ordina 
code into long instruction words, a VLIW machine was expected to provide from ten to thirty times the perfor 
conventional machine built of the same implementation technology. Multiflow Computer, Inc., has now built a
TRACE™


A study on the number of memory ports in multiple instruction issue machines

Soo-Mook Moon, Kemal Ebcioglu 

December 1993 Proceedings of the 26th annual international symposium on Microarchitecture 

Andrei Terechko, Erwan Le Thenaff, Henk Corporaal

October 2003 Proceedings of the international conference on Compilers, architectures and synthesis f 
systems 


In this paper high-level language (HLL) variables that are alive in a whole HLL function, across multiple sched 
termed as global values. Due to their long live ranges and, hence, large impact on the schedule, the global va 
different compiler optimizations than local values, which span across only one scheduling unit. The instruction
clustered ILP processor, which is responsible for cluster assignment of operations and variables, faces a difficu 

Keywords: ILP, VLIW, cluster assignment, compiler, instruction scheduler, register allocation
Nikos P. Pitsianis, Gerald G. Pechanek

March 2003 ACM SIGARCH Computer Architecture News, volume 31 issue 1


The indirect very long instruction word (iVLIW) architecture and its implementation on the BOPS ManArray fam 
multiprocessor digital signal processors (DSP) provides a scalable alternative to the wide instruction busses u 
a multiprocessor VLIW DSP. The ManArray processors indirectly access VLIWs from small caches of VLIWs loc 
processing element. With this work, we present an algorithm to perform 1) iVLIW instruction memory allocati
processing el ... 

Employing register
R. Gupta 

February 1990 ACM SIGPLAN Notices, Proceedings of the second ACM SIGPLAN symposium on Principl

of parallel programming, volume 25 issue 3 

A multiprocessor system capable of exploiting fine-grained parallelism must support efficient synchronization 
mechanisms. This paper demonstrates the use of shared register channels as the communication mechanism 
processors in a multiprocessor chip. A register channel is provided with a synchronization bit that is used to e 
processor succeeds in reading a channel only after the channel has been written to. In contrast to a VLIW ma 
with channe ... 

An instruction-level performance analysis of the Multiflow TRACE 14/300
Michael A. Schuette, John P. Shen 

September 1991 Proceedings of the 24th annual international symposium on Microarchitecture



Compiler transformations for high-performance computing
David F. Bacon, Susan L. Graham, Oliver J. Sharp 
December 1994 ACM Computing Surveys (CSUR), Volume 26 issue 4 


In the last three decades a large number of compiler transformations for optimizing programs have been impl
optimizations for uniprocessors reduce the number of instructions executed by the program using transformat 
the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance sup 
vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tr 
properties o ... 

Keywords: compilation, dependence analysis, locality, multiprocessors, optimization, parallelism, superscala 
vectorization 
Sachin V. Chitnis, Manoranjan Satpathy, Sundeep Oberoi

March 1995 ACM SIGPLAN Notices, Papers from the 1995 ACM SIGPLAN workshop on Intermediate

representations, volume 30 issue 3

The declarative nature of functional programming languages causes many difficulties in their efficient implem 
conventional machines. The problem is much harder when the language has non-strict (lazy) semantics. Abst 
serve as an intellectual aid in bridging the semantic gap between such languages and the conventional von Ne
architecture. However they become more and more complex with time as efficiency considerations force the i 
the machine to gr ... 

Keywords: abstract machines, compiling and optimizations, control flow analysis, functional programming 
Rajiv A. Ravindran, Robert M. Senger, Eric D. Marsman, Ganesh S. Dasika, Matthew R. Guthaus, Scott A. Mahlke
Brown 

October 2003 Proceedings of the international conference on Compilers, architectures and synthesis f 
systems 


Low-power embedded processors utilize compact instruction encodings to achieve small code size. Instruction
bits are common. Such encodings place tight restrictions on the number of bits available to encode operand s 
thus on the number of architected registers. The central problem with this approach is that performance and 
sacrificed as the burden of operand supply is shifted from the register file to the memory due to the limited n 
register ... 

Keywords: embedded processor, graph partitioning, instruction encoding, low-power, register window, windo
Instruction path coprocessors

Yuan Chou, John Paul Shen 

May 2000 ACM SIGARCH Computer Architecture News, Proceedings of the 27th annual internation

on Computer architecture, volume 28 issue 2 

This paper presents the concept of an Instruction Path Coprocessor (I-COP), which is a programmable on-chip 
with its own mini-instruction set, that operates on the core processor's instructions to transform them into an 
that can be more efficiently executed. It is located off the critical path of the core processor to ensure that it d 
negatively impact the core processor's cycle time or pipeline depth. An I-COP is highly versatile and can be us 

Register connection: a new approach to adding registers into instruction set architectures
Tokuzo Kiyohara, Scott Mahlke, William Chen, Roger Bringmann, Richard Hank, Sadun Anik, Wen-Mei Hwu
May 1993 ACM SIGARCH Computer Architecture News, Proceedings of the 20th annual internation
on Computer architecture, Volume 21 issue 2 


Code optimization and scheduling for superscalar and superpipelined processors often increase the register re 
programs. For existing instruction sets with a small to moderate number of registers, this increased register r 
be a factor that limits the effectiveness of the compiler. In this paper, we introduce a new architectural method
of extended registers into an architecture. Using a novel concept of connection, this method allows the data s 
Rajiv Gupta
This paper discusses the use of shared register channels as a data exchange mechanism among processors in
MIMD system with a load/store architecture. A register channel is provided with a synchronization bit that is u 
that a processor succeeds in reading a channel only after a value has been written to the channel. The instruc 
by this load/store architecture allow both registers and register channels to be used as operand sources and r 

Keywords: aliasing, channels, fine-grained parallelism, instruction scheduling, multiprocessor system, paralle 


20 Tolerating data access latency with register preloading

William Y. Chen, Scott A. Mahlke, Wen-mei W. Hwu, Tokuzo Kiyohara, Pohua P. Chang
August 1992 Proceedings of the 6th international conference on Supercomputing 


By exploiting fine grain parallelism, superscalar processors can potentially increase the performance of future
supercomputers. However, supercomputers typically have a long access delay to their first level memory whic 
restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough a 
latency. However, conventional movement of load instructions is limited by data dependence analysis. This pa 
simple har ... 

Keywords: VLIW/superscalar processor, data dependence analysis, load latency, register file, register preloa 